Schedule
April 26, 2026 - Rio de Janeiro, Brazil
Location: Room 211, Riocentro Convention and Event Center
All accepted papers can be found at https://openreview.net/group?id=ICLR.cc/2026/Workshop/AFAA&referrer=%5BHomepage%5D(%2F)
Fairness Failure Modes of Multimodal LLMs
Canyu Chen, Anglin Cai, Joan Nwatu, Jianshu Zhang, Yale Li, Han Liu, Jessica Hullman, Rada Mihalcea, Kathleen McKeown, Manling Li
Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs
Edie Pearman, Sophia Osborne, Mira Kandlikar-Bloch, Mina Arzaghi, Florian Carichon, Golnoosh Farnadi
Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations
Preethi Seshadri, Samuel Cahyawijaya, Ayomide Odumakinde, Sameer Singh, Seraphina Goldfarb-Tarrant
Operationalizing Fairness in Text-to-Image Models: A Survey of Bias, Fairness Audits and Mitigation Strategies
Megan Smith, Venkatesh Thirugnana Sambandham, Florian Richter, Matthias Uhl, Laura Crompton, Torsten Schön
When AI Describes Race? Unveiling Racial Bias in Vision-Language Models in Brazilian People
Leodécio Braz da Silva Segundo, Marcos M. Raimundo
Moral Preferences of LLMs Under Directed Contextual Influence
Phil Blandfort, Tushar Karayil, Urja Pawar, Alex McKenzie, Robert Graham, Dmitrii Krasheninnikov
Memories That Discriminate: Detecting and Correcting Bias in Personalized Hiring Agents
Himanshu Gharat, Himanshi Agrawal, Gourab K Patro
Scalable Intersectional Bias Auditing in Vision-Language Models through Combinatorial Interaction Testing
Heejin Bin, Junyoung Choi, JangHyun Kim, Seungjae Kim, Shin Yoo
Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths
Mind the Gap: Evaluating Model- and Agentic-Level Vulnerabilities in LLMs with Action Graphs
Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, Philip Colin Treleaven
OC-PRM: Overcredit-Contrastive Training for Precision-First Process Reward Models
Aakriti Agrawal, Souradip Chakraborty, Armin Saghafian, Nihal Sharma, Rizal Fathony, Nam H Nguyen, C. Bayan Bruss, Amrit Singh Bedi, Furong Huang
Red Teaming the Rules: An Adversarial Approach to Legal Alignment
Rui-Jie Yew, Greg Demirchyan
Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation
Sola Kim, Dongjune Chang, Jieshu Wang
GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
Pepijn Cobben, X. Angelo Huang, Thao Amelia Pham, Isabel Dahlgren, Terry Jingchen Zhang, Zhijing Jin
Automatically Finding Reward Model Biases
Atticus Wang, Iván Arcuschin, Arthur Conmy
Improving Fairness via Noise Injection in Vision Transformers
Qiaoyue Tang, Sepidehsadat Hosseini, Mengyao Zhai, Thibaut Durand, Greg Mori
Learning to Be Fair: Modeling Fairness Dynamics by Simulating Moral-Based Multi-Agent Resource Allocation
Haiyan Feng, Yuqiao Du, Huacong Tang, Junjie Liao, Yipeng Kang, Mingjie Bi, Fangwei Zhong, Zhou Ziheng
MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment
Andor Vári-Kakas, Ji Won Park, Natasa Tagasovska
Navigating the Rashomon Set: The Impact of Score Distributions and Decision Thresholds on Model Agreement
Giovani Valdrighi, Marcos M. Raimundo
Robust AI Evaluation through Maximal Lotteries
Hadi Khalaf, Serena Lutong Wang, Daniel Halpern, Itai Shapira, Flavio Calmon, Ariel D. Procaccia
Verifying Alignment Constraints Under Finite-Sample Uncertainty in Composite-Data Regimes
Blossom Metevier, Max Springer, Bohdan Turbal, Aleksandra Korolova
Reward-free Alignment for Conflicting Objectives
Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin
Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search
Manos Plitsis, Giorgos Bouritsas, Vassilis Katsouros, Yannis Panagakis
State Space Models are Effective Sign Language Learners: Exploiting Phonological Compositionality for Vocabulary-Scale Recognition
Bryan Cheng, Austin Jin, Jasper Zhang
Evaluating black-box vulnerabilities with Wasserstein-constrained data perturbations
Adriana Laurindo Monteiro, Jean-Michel Loubes
Distortion of AI Alignment Revisited: RLHF Is a Decent Utilitarian Aligner
Kazusato Oko, Annie S Ulichney, Nika Haghtalab, Han Bao
FairMed-VLM: Toward Equitable Medical Diagnosis with Vision–Language Models
Zihao Chang, Ruixiang Zhu, Daochu Li, Chaozhi Geng, Siqi Chen
Probing Implicit Bias Risk Framing in Language Models
Rishi Kalra, Andrea Dhelpra, Seonglae Cho, Adriano Koshiyama
Disparities in Negation Understanding Across Languages in Vision-Language Models
Charikleia Moraitaki, Skyler Pulling, Sarah Pan, Gwendolyn Flusche, Kumail Alhamoud, Marzyeh Ghassemi
Procedural Fairness Failures in RLHF from Preference Averaging
M P V S GOPINADH, Karthik Kamuju, Kummari Avinash, Muppana John Joshua, Srinivasa Raju Rudraraju
Long-term Fairness with Selective Labels
Giovani Valdrighi, Isabel Valera, Marcos M. Raimundo
Differential Adjusted Parity for Learning Fair Representations
Bucher Sahyouni, Matthew James Vowels, Liqun Chen, Simon Hadfield
Metanetworks as Regulatory Operators: Learning to Edit for Requirement Compliance
Ioannis Kalogeropoulos, Giorgos Bouritsas, Yannis Panagakis
Cross-Linguistic Failures and Disparities in LLM Medical Reasoning: Analyzing XMedBench and CrossMMLU Across Western and Non-Western Languages
Rehan Nazeem, Akira Hoque, Vedesh Ray Peddoddi, Tim Liu, Kevin Zhu
SOMnibus: Recovering Underlying Sensitive Attributes with Self-Organizing Maps
Joseph Charles Bingham, Netanel Arussy, Dvir Aran
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
Weijie Xu, Xi Fang, Yuchong Zhang, Stephanie Eckman, Scott Nickleach, Chandan K. Reddy